A code is defined by the nature of the symbols, which are used 
to generate information-storing combinations (e.g. oligo- and 
polymers). Like nucleic acids and proteins, oligo- and poly- 
saccharides are ubiquitous, and they are a biochemical platform 
for establishing molecular messages. Of note, the letters of the 
sugar code system (third alphabet of life) excel in coding 
capacity: variability in anomery and linkage position of the 
glycosidic bond, in ring size and in branching makes an 
unsurpassed versatility of isomer (code word) formation 
possible. The enzymatic machinery for glycan biosynthesis 
(writers) realizes this enormous potential for building a large 
vocabulary. It includes possibilities for dynamic editing/erasing, 
as known from nucleic acids and proteins. Matching the glycome 
diversity, a large panel of sugar receptors (lectins) has 
developed based on more than a dozen folds. Lectins ‘read’ the 
glycan-encoded information. Hydrogen/coordination bonding 
and ionic pairing together with stacking and C-H/π-interactions 
as well as modes of spatial glycan presentation underlie the 
selectivity and specificity of glycan-lectin recognition. Modular 
design of lectins together with glycan display and the nature of 
the cognate glycoconjugate account for the large number of 
post-binding events. They give each entry of the glycan 
vocabulary its functional, often context-dependent meaning(s), 
thereby building the dictionary of the sugar code. 


1. Introduction 


"To an observer trying to obtain a bird's eye view of the present 
state of biochemistry - life may until very recently have seemed 
to depend on only two classes of compounds: nucleic acids and 
proteins." They are connected by the genetic code. The 
sequence of three symbols (letters) of the first alphabet of life 
(nucleotides) stands either for an amino acid (one of the letters 
of the second alphabet of life) or a stop signal so that nucleic 
acid becomes the template for protein biosynthesis. In this 
special (though fundamental) case of using the term 'code' in 
life sciences, the information stored in a (nucleotide) sequence 
has the biological meaning of a sequence of a protein: the 
dictionary for the 64 entries of the vocabulary of trinucleotides 
provides their translation into an amino acid or a stop signal. In 
other cases of using the term 'code', the information encoded 
in combinations of biochemical symbols (the molecular mes- 
sages or code words) is ‘understood’ (decoded) by a ‘reader’ 
(receptor). It then initiates the translation of this information 
into biofunctionality by post-binding (‘reading’) events, and 
here sugars come into play. That they have, for example, been 
assumed to be the letters of "a potential carbohydrate 
'language' involved in intercellular interactions" or a molecular 
basis of the "cell-surface code" illustrates their status as 
third alphabet of life. 
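
As a minimal sketch of the counting behind the 64 dictionary entries of the genetic code mentioned above, the following Python lines enumerate all trinucleotides that can be written with a four-letter alphabet. The letters are given as for mRNA (A, C, G, U); the variable names are illustrative only.

```python
from itertools import product

# Four letters of the first alphabet of life (written as for mRNA).
NUCLEOTIDES = "ACGU"

# Every ordered triplet is one entry of the trinucleotide vocabulary.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

print(len(codons))  # 4**3 = 64
```

Each of these 64 words is then assigned its meaning (an amino acid or a stop signal) by the dictionary of the genetic code.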

Originating from the analytical milestones of the identification 
of the biochemical nature of (snail) mucin as glycoprotein 
and of "Glycosamin" (N-acetylglucosamine, GlcNAc) as building 
block of the polysaccharide chitin, work on glycoproteins 
continued to focus on elucidating structural and synthetic 
aspects for a long time. This situation changed once the 
abundance, ubiquitous presence and structural diversity of 
glycans on cell surfaces (and also in a polysaccharide-rich 
("sugary") coating termed glycocalyx) had been documented and 
the enormous potential of the described structural complexity 
of oligosaccharides for information coding had been realized. 
Glycocompounds obviously appeared to have more talents than to 
store energy and to be a molecular concrete for cell wall 
stability. The catchword summarizing the resulting hypothesis 
of their involvement in cellular processes on a broad scale as 
molecular messages simply connected 'sugar' and 'code'. 
Historically, serendipity (the local vicinity of two labs) 
helped to do so. At the time when the genetic code was 
cracked, the term 'sugar code' was suggested by drawing the 
following analogy: "just as Marshall (Nirenberg) was working on 
a nucleic acid code that determined the structure of proteins, 
Vic(tor) Ginsburg [a member of the Gordon Tomkins laboratory, 
whose lab was across the hall from Marshall's at NIH Building 
10] believed there is a sugar code that determines intercellular 
interactions" (p. 25). 

When applying the code concept to glycans, the first step 
on the way to prove that there is every reason to believe in 
their position in the flow of biological information is to explain 
why "carbohydrates are ideal for generating compact units with 
explicit informational properties". Using these biochemical 
symbols, a sophisticated system of 'writers', together with 
'editors & erasers', leverages sugars to generate the large 
vocabulary of the carbohydrate language (the glycome). 
Since carbohydrates are equipped with ample chemical means 
for molecular recognition such as their hydroxyl groups (the 
stoichiometric proportion of $\mathrm{C}_n(\mathrm{H_2O})_n$ with $n = 6$ in hexoses 
and the etymological roots for carbon (Latin 'carbo' = coal) and 
water (Greek 'hydor') explain the origin of the term 
'carbohydrate'; for further etymological information, please see 
Ref. [11]), the 'reading' of 'words' written in sugar is easy. Tissue 
receptors (lectins) are a link between the glycan-encoded 
information and the actual process such as cell adhesion, so that 
a primer to mammalian lectins will follow. 

It starts by highlighting their diversity on the level of protein 
folds. Once a peptide fold has acquired ability to bind sugar, 
this structure is a starting point for ensuing evolutionary 
diversification. Variations in lectin sequence and modular 
design as well as selectivity and specificity of their pairing with 
cellular glycoconjugates underlie the translation of the glycan- 
encoded message into a distinct cellular response. The long- 
term aim of work on the sugar code, i.e. to compile a dictionary 
for the vocabulary (listing the functional meaning(s) for glycan 
words), is finally sketched by describing biomedical activity of 
glycan-lectin recognition exemplarily (in its cellular context). 
Since "only in recent years have we begun to appreciate how 
deeply glycan functions pervade all aspects of organismic 
biology, molecular biology, and biochemistry", this introduction 
to the concept of the sugar code can be of interest for a 
broad readership. 


2. Letters of the Sugar Alphabet 


Prebiotic conditions on earth are generally assumed to have 
allowed the synthesis of glyceraldehyde and its keto-tautomer, 
which then formed ketohexoses by aldol condensation. 
Lobry de Bruyn rearrangement led to the aldohexoses 
D-glucose (Glc) and D-mannose (Man), primitive metabolism 
then to D-galactose (Gal). These sugars have no or just one 
1,3-diaxial interaction, so that their presence in polysaccharides 
and eukaryotic glycans is thermodynamically favored (the mystery 
why Glc is not present in glycans of mature glycoproteins will 
be solved below). The structures of often used carbohydrate 
letters are shown in Supporting Information, Figure S1. 
Intriguingly, with Gln as donor, Glc can be converted to the 
amino sugar glucosamine (GlcN), which like GalN is then 
N-acetylated to yield GlcNAc and GalNAc (Figure S1). A biochemical 
letter (see also below for the case of 5-methylcytosine) is 
specifically modified, through which the collection of symbols 
(alphabet of life) is extended in number. The impact of a 
modification on a letter's meaning is nicely demonstrated by 
the following analogy: these derivatives can be considered as 
the equivalent of an Umlaut in the German language used for 
the letters A (i.e. Ä), O (i.e. Ö) and U (i.e. Ü). 

In order to assemble oligo- and polymers, monosaccharides 
(like nucleotides or amino acids) must first be activated, and 
they are most reactive at their anomeric center for this 
purpose. Physiologically, the resulting conjugate with a 
nucleoside mono- or diphosphate acts as the glycosyl donor for 
the enzymatic transfer of sugar to an acceptor, and this 
specifically in either the α- or the β-anomeric position. This is 
the first source for structural variability of glycans beyond the 
sequence. The next one is due to the chemical equivalence of 
the other hydroxyl groups besides the one at the anomeric 
center. 

In contrast to making phosphodiester and peptide bonds, 
glycan biosynthesis is not restricted to connecting fixed 
positions between donor and acceptor. Instead, it can engage 
more than one hydroxyl group of a sugar used as an acceptor 
when an oligosaccharide is built in steps by glycosyltransfer- 
ases. To give an example, a single diglycoside would be 
expected as sole product when a donor-acceptor pair is linked 
in nucleic acid/protein style. 

The possibility of the enzymatic transfer of a sugar to more 
than one acceptor site, however, makes it clear that more 
than a single product will be obtained, and we illustrate this 
principle in a figure. Using L-fucose as a graphic example, its 
naturally occurring α1,2-, α1,3-, α1,4- or α1,6-linkages are drawn 
in Figure 1 (top panel). Impaired fucosylation, to underline 
clinical significance, is the cause of a leukocyte adhesion 
deficiency (LAD II/CDG IIc) and, in mouse models with 
engineered deficiencies in α1,3- or α1,6-fucosylation, of disorders 
in leukocyte trafficking (by lowering production of ligands 
for cell adhesion molecules, i.e. selectins; see below) or 
diminished growth factor signaling, respectively. To showcase 
the various positions of Fuc residues in natural glycans, the 
structures of the histo-blood group ABH(0) and Lewis (Le*/Le’) 
determinants and of the N-glycan stem with its core fucosyla- 
tion are presented in the bottom panel of Figure 1 (please note 
that the presence of Fuc in the blood-group H(0) epitope and 
its property as ligand were the prerequisites to demonstrate 
inhibition of hemagglutination by a sugar, here a derivative of 
Fuc: see below). 
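
The combinatorial point made above (one donor, several acceptor positions, two anomeric configurations) can be sketched in a few lines of Python. This is only a formal count under stated assumptions: the acceptor positions 2, 3, 4 and 6 are the ones named in the text for fucosylation, and the function name `linkage_labels` is hypothetical. Which of the formal possibilities actually occur depends on the available glycosyltransferases.

```python
from itertools import product

ANOMERS = ("α", "β")               # configuration at the donor's anomeric C1
ACCEPTOR_POSITIONS = (2, 3, 4, 6)  # hydroxyl positions named in the text

def linkage_labels(anomers=ANOMERS, positions=ACCEPTOR_POSITIONS):
    """Return formal linkage labels such as 'α1,3' for a C1-activated donor."""
    return [f"{anomer}1,{pos}" for anomer, pos in product(anomers, positions)]

all_formal = linkage_labels()                   # both anomers, four positions
fucose_natural = linkage_labels(anomers=("α",)) # linkages listed for L-fucose

print(len(all_formal))   # 8 formal possibilities
print(fucose_natural)    # ['α1,2', 'α1,3', 'α1,4', 'α1,6']
```

In nucleic acid/protein style, the same donor-acceptor pair would yield a single product.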

Inspecting the structures of the oligosaccharides shown in 
Figure 1 closely now makes evident that, in contrast to nucleic 
acids and proteins, a branch is installed into a linear glycan by 
fucosyltransferases (a second example for branching by 
glycosyltransferases is shown in Supporting Information, Figure S2 
and is explained below). Moreover, the Fuc moiety can also 
be transferred from its donor (GDP-Fuc) directly to the hydroxyl 
of serine or threonine in O-fucosylation of epidermal growth 
factor (EGF)-like and thrombospondin type 1 repeats in the 
endoplasmic reticulum. 

Figure 1. Illustration of the four routes to transfer the Fuc moiety from its GDP-Fuc donor (GDP given as X) to the 2, 3, 4, or 6 position of glycan acceptors by 
mammalian fucosyltransferases (top panel); examples of resulting oligosaccharides in N- and O-glycans that define the R' position are presented in the bottom 
panel. This part shows glycans with α1,2-fucosylation (histo-blood group ABH(0) epitopes), with α1,3-fucosylation ((sialyl) Leˣ), with α1,4-fucosylation (Leᵃ) and 
with α1,2/4-fucosylations (Leᵇ) as well as the N-glycan stem with α1,6-fucosylation termed core fucosylation (examples for lectins that bind the respective 
structure are named). 

Overall, our case study of fucosylation thus teaches the 
lesson that the chemical properties of monosaccharides make 
activation at the anomeric center and natural variability of 
where to add the sugar to an acceptor possible. Also in contrast 
to nucleic acids and proteins, branching is common in glycans. 
On the side of the enzymes, the availability of a group of 
acceptor site-specific glycosyltransferases for each letter of the 
sugar alphabet, the fucosyltransferase family consisting of 13 
members in mammals, ensures that the enormous inherent 
potential of carbohydrates to yield glycan diversity is realized. 
Since the enzymatic apparatus for glycan biosynthesis with a 
total of at least 167 glycosyltransferases has developed the 
required complexity during evolution to prepare many more 
than a few isomers, as writing with letters of an alphabet does, 
coding by glycans will reach the comparatively highest capacity. 
Clearly, it would mean missing manifold opportunities if doing 
so were without physiological significance. It is thus fair to 
conclude that underestimating sugars as code symbols amounts 
to making a snap judgement. 

Intriguingly, even a further structural feature has been 
detected that increases diversity among glycans, i.e. the ring 
size (5-membered furanose vs 6-membered pyranose). The 
frequent presence of galactofuranose (Galf, shown in Figure S1) 
in polysaccharides and glycoconjugates of bacteria, fungi and 
parasitic protozoa is a proof-of-principle case. Its profile of 
distribution in Nature therefore predestines the occurrence of 
the five-membered ring for Gal as an indicator of non-self origin 
of a complex carbohydrate, to which host defense can be 
directed (see below). Similarly, O-methylated Man/Fuc residues 
resulting from S-adenosylmethionine-dependent modification 
(as in methylation of cytosine or histones) offer such a target 
site, because they are absent in mammals. This qualitative 
difference nourishes the expectation for an exploitation of this 
sugar-based trait in host defense, too (see below). 

In summary, adding the ring size to the status of anomery 
and to the ability of all hydroxyl groups to serve as acceptor 
sites, including the frequent occurrence of branching, accounts 
for the unsurpassed level of structural permutations within 
glycans among biomolecules. When expressing the 
information-storing potential of the sugar code in numbers, i.e. 
the coding capacity, a set of six letters from the sugar alphabet 
(shown in Supporting Information, Figure S1) will theoretically 
build $1.05 \times 10^{12}$ linear and branched hexasaccharides, 
quite favorably comparing in pool size to the $6.4 \times 10^{7}$ 
hexapeptides from 20 amino acids, and there is more. Evolution 
has developed even more means to further increase the coding 
capacity by biochemical symbols. This is done by introducing 
biochemical modifications after the assembly of oligo- and 
polymers, that is post-synthetically. 
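
The pool sizes compared above can be checked with a short calculation. The sketch below only recomputes the linear cases (hexapeptides and, for comparison, hexanucleotides); the hexasaccharide figure is entered as the value of 1.05 × 10^12 quoted in the text and is not derived here. The function name `linear_sequences` is illustrative.

```python
def linear_sequences(alphabet_size: int, length: int) -> int:
    """Number of linear sequences when each position is chosen freely."""
    return alphabet_size ** length

hexapeptides = linear_sequences(20, 6)     # 20 amino acids
hexanucleotides = linear_sequences(4, 6)   # 4 nucleotides

# Value quoted in the text for linear and branched hexasaccharides built
# from six sugar letters; taken from the text, not recomputed here.
HEXASACCHARIDES_CITED = 1.05e12

print(hexanucleotides)                       # 4096
print(hexapeptides)                          # 64000000
print(HEXASACCHARIDES_CITED / hexapeptides)  # ratio, roughly 1.6e4
```

With the same word length of six letters, the cited glycan pool thus exceeds the peptide pool by about four orders of magnitude.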

This natural diversification strategy is common to all three 
types of biomolecular alphabet. Diverse types of substitution of 
a basic structure like the addition of a phosphate (in two steps) 
or a sulfate are known to occur in letters of the sugar alphabet, 
as diverse as they are for example known from nucleotides, 
and they are clinically relevant, as the case of deficiency in Man 
phosphorylation as cause of the I-cell disease (mucolipidosis II) 
underscores. The initial placement of a modification into a 
glycan can be followed by further enzymatic processing. As can 
happen with 5-methylcytosine, the fifth letter in DNA, by 
hydroxylation, the methyl group of an N-acetyl substituent can 
similarly be oxidized to the hydroxymethyl (in N-acetylneuraminic 
acid to yield the N-glycol(o)yl as shown in Figure S1 
(bottom), one of up to nearly 50 ways to create a sialic acid 
from this parental compound). 

In general, post-synthetic modifications give letters a new 
meaning. Phosphorylation (in the 6-position of Man labeling 
glycans of lysosomal enzymes) or sulfation (at the 4-position of 
a branch-end GalNAc in N-glycans of distinct glycoproteins such 
as certain pituitary glycohormones (LH, TSH) or at the 3-position 
of the sulfatides’ Gal headgroup) at specific sites in glycans 
have been likened to a postal-code writing for transport 
processes (see below). It now becomes clear why carbohydrates 
had rightly been judged to be "ideal" for this purpose. 


ChemBioChem 2021, 22, 1-25 www.chembiochem.org 



Ironically, exactly this property had been responsible for 
slowing down the progress of research. "In this remarkable age of 
genomics, proteomics, and functional proteomics, I am often 
asked by my colleagues why glycobiology has apparently 
lagged so far behind the other fields. The simple answer is that 
glycoconjugates are much more complex, variegated, and 
difficult to study than proteins or nucleic acids." Interestingly, 
this already holds true for individual letters: the elucidation of 
the structure of the cited N-acetylneuraminic acid, for which a 
total of 11 structures were proposed over time, took 25 years. 
After having surveyed the structural basis for reaching high-level 
versatility within the sugar code, we now move from the 
alphabet of sugar letters to the vocabulary of glycans. 


3. The Vocabulary of the Sugar Code 


The presented proof-of-principle case of fucosylation has 
illustrated the existence of an elaborate system for enzymatic 
assembly to turn the described potential of sugars for structural 
glycan diversity into reality. The members of the team of the 
sets of glycosyltransferases with their genuine specificities for 
donor and acceptor pairs as well as for the status of anomery (α 
or β), for linkage positions and for ring size are called the 
writers. Products of glycogenes, for example for sugar activation 
and transport, assist and feed the assembly line. Fluctuations in 
the status of substrate and enzyme availabilities will dynamically 
modulate characteristics of the product panel, as for 
instance work with toxin (lectin)-resistant cell mutants and 
detection of compensatory responses within the glycome to an 
engineered deficiency for a glycosyltransferase in vivo 
revealed. Writing proceeds in principle in a stepwise manner 
to generate linear and branched glycans. Chain elongation can 
produce repeats of a building block, i.e. the N-acetyllactosamine 
(LacNAc) unit building oligo- or polyLacNAc sequences in 
(β1,6-linked) N- and (core 2/4) O-glycan branches or the 
glycosaminoglycan keratan sulfate; branched structures reach 
an up to penta-antennary design in the case of the complex-type 
N-glycans (the names of the six involved β1,2/4/6-GlcNAc 
transferases (GnTs) along with product designations after 
sequential GlcNAc transfer to the N-glycan core pentasaccharide 
are graphically displayed in Supporting Information, Figure S2; 
the 11 GnTs for the β1,3-linkage are involved in other pathways 
of glycan biosynthesis; see below). Carbohydrate chemistry 
has succeeded in developing elegant strategies for production of 
such structures, as exemplarily shown for a LacNAc dimer 
(DiLacNAc) in Figure 2 (top part; for details on synthesis, please 
see Supporting Information, Scheme S1). Such synthetic 
oligosaccharides can then be used for interaction analysis such as 
calorimetry or spectroscopy (please see below). 

Alternatively, such glycan derivatives can be conjugated to 
proteins and dendrimer scaffolds. What started with 
neoglycoproteins used as antigens has led to diverse applications 
of the products as sensors for the presence of sugar receptors 
and the elucidation of their special binding properties. This 
way, biomimetics of cellular glycoconjugates with up to 
multiantennary N-glycans or of the clustered appearance of 
O-glycans in mucins are obtained. 

Figure 2. Overview on DiLacNAc synthesis (for details, please see Supporting Information, Scheme S1) and the calorimetric titration profiles of its interaction 
with human galectin-3 in H₂O (bottom, left) and D₂O (bottom, right). For details, please see Ref. [63e]. 

Examples of a trivalent glycocluster and of a (starburst) 
16mer glycodendrimer are presented in Figure 3 (details given 
in Supporting Information, Scheme S2). Synthetic glycoclusters 
and glycodendrimers are valuable tools to answer questions on 
the relevance of topological features of glycan presentation for 
their biological meaning, and, therefore, their successful 
application spurs continued vigorous synthetic efforts (for work 
to coin the common term 'glycoside cluster effect', please see 
below). With glycodendrimers at hand, not only the natural 
branching of glycans in glycoconjugates can be mimicked but 
models for experimental study can also be brought to the level of 
cell surface (microdomain)-like glycoconjugate presentation. 
The synthesis of amphiphilic Janus glycodendrimers, capable of 
self-associating to various types of nanoparticles, as 
glycosphingolipid assembly underlies the classical liposomes, 
paved the way to prepare fully surface-programmable vesicle-like 
models to systematically study bridging phenomena by sugar 
receptors in a bottom-up manner. 

Figure 3. Illustrations of the syntheses of a triiodobenzene-based trivalent glycocluster (top) and of a 16mer starburst glycodendrimer (bottom). For further 
information on the syntheses, please see Supporting Information, Scheme S2 and the original reports with details on results of lectin assays [96]. 

After the initial writing process, specific letters within 
glycans can be modified by the equivalents of editors, as 
already indicated above: sulfotransferases for respective N- and 
O-substitutions or an epimerase for converting D-glucuronic 
acid into L-iduronic acid in glycosaminoglycans belong to this 
group (see Figure S1 and below). These enzymes can cooperate 
to implement the enormous diversity in the disaccharide unit of 
the glycosaminoglycan chains of proteoglycans. Creating different 
patterns of substitutions is an intriguing strategy to let this 
simple structural platform acquire complexity, as shown in 
Supporting Information, Figure S3. In space and time, glycan 
structures and their modification patterns do not remain 
unchanged. Erasers remove added groups from the 'message' 
up to the size of a letter, as seen, for example, for sialic acids 
removed from oligosaccharides of distinct gangliosides such as 
GD1a upon cell activation or differentiation (see below) and for 
Glc and Man residues during the maturation of N-glycans in the 
endoplasmic reticulum. In appealing analogy to the chain of 
events when shaping the vocabulary of nucleic acids or 
proteins, writers intimately team up with editors and erasers to 
increase sequence variability and add dynamic reshaping to 
information coding. Cycles of post-synthetic modification and 
removal of substitutions by coordinated editor & eraser 
activities thus are a hallmark of all three coding platforms in the 
flow of biological information. The case study of the events to 
shape the histone code highlights these principles. The 
fundamental lesson thus is that fine-tuning of vocabularies by 
post-synthetic processing is a common feature of biochemical 
codes that are based on each of the three alphabets of life. 

Has the analytical technology to define the glycan vocabulary 
(glycome) reached the necessary level to perform its 
detailed mapping on the level of cells? Starting from the 
stepwise characterization of cellular glycans and of the parts of 
the enzymatic machinery to generate them, global profiling of 
the products of glycosylation pathways of wild-type and 
genetically engineered (for glycogenes) eukaryotic cells has 
indeed been achieved. Glycan analysis at the glycome level 
(for a recent review on experimental approaches, please see 
Ref. [36]) enables respective profile monitoring at high-level 
sensitivity, big-data glycomics then leading to its integration 
into systems biology. 

Figure 4. Lectin histochemical localization of glycans in sections through 
retina and anterior segments of fixed adult chicken eyes. Detection of the 
mucin-type O-glycan core 1 disaccharide (TF antigen) in retina's 
photoreceptor layer (inset: inhibition control with cognate sugar) (A), of LacNAc 
oligomers in connective tissue and epithelial cells of the ciliary body (B), of 
α2,3 (C)- or α2,6 (D)-sialylated N-glycans in immune cell aggregates of 
Harderian gland, of β1,6-branched N-glycans between lens fibers (E) and of 
β-galactosides (bound by the labeled chicken galectin CG-1B) in corneal 
epithelium (inset: inhibition control with cognate sugar) (F). Scale bars: 
20 µm (Reproduced with permissions from Ref. [97b] Copyright 2017 John 
Wiley and Sons and from Ref. [97c] Copyright 2018 Elsevier; for technical 
details and information on the lectins used as tools, please see Ref. [97]). 

The step from detection and characterization of glycans to 
their localization in cells and tissues is facilitated by cyto- and 
histochemistry using cells and sections as assay platform. The 
initial approach of monitoring sugar presence by performing 
chemical visualization protocols such as the periodic acid-Schiff 
stain has been replaced by using sugar receptors, and this 
with a considerable gain of specificity. Figure 4 documents what 
distribution profiles of distinct glycan determinants in sections 
of fixed and paraffin-embedded tissues look like. The systematic 
application of this technique has revealed clearly 
non-uniform/-random patterns of presence of glycan determinants 
with spatiotemporal dynamics of expression. Fine-tuned regulatory 
mechanisms on genetic, epigenetic, transcriptional and 
post-transcriptional/-translational levels in combination with 
the noted factor of acceptor/donor availabilities turn the large 
potential for glycan diversity into reality, such mechanisms 
recently unpicked in the cases of the two α2,6-sialyltransferases 
(an example for detection of α2,6-sialylated N-glycans by a 
fungal lectin is shown in Figure 4D). Swift coordinated 
reactions to external factors, e.g. a stressor such as tunicamycin 
that blocks the route toward N-glycosylation at its first step, 
here to safeguard homeostasis by the unfolded protein 
response, support the conclusion of a broad-scale physiological 
significance of protein glycosylation, as emphasized by an 
explicit statement from the literature given above. Looking at 
the transcriptional regulation, the multitude of permutations of 
individual control elements for expression of glycogenes is 
comparable to what a multi-dimensional switchboard can 
achieve and a challenge for explorations. In our context, it is 
imperative to underline that specific glycan-protein recognition 
underlies this method for detecting saccharides and hereby 
monitoring spatiotemporal expression patterns. The letters of 
the sugar alphabet are well-suited to make this interaction 
selective and specific because they offer regions of considerable 
size for molecular complementarity, which we will look at next. 
Hydroxyl groups are readily accessible to establish directional 
hydrogen or coordination (with Ca²⁺ in proteins) 
bonding. The position of each substituent is checked in this 
process for complementarity: if either the equatorial hydroxyl of 
Glc(NAc)/Man or the axial hydroxyl of Gal at the C4-position 
(see Supporting Information, Figure S1) together with a second 
OH group is engaged for example in coordination bonding (in 
analogy to bidentate H-bonding involving side chains of Arg 
and others), then the lectin's specificity to Man or Gal is readily 
explained. Hydrophobic complementarity is achieved with 
methyl groups present for example in Fuc or GlcNAc/GalNAc. 
C-H/π-interactions are possible between the B-face of Gal (that 
presents the slightly polarized CH groups) and the π-electrons 
of a Trp residue. Last but not least, strong ionic pairing brings 
sialic acids and charged sugar derivatives such as sulfated 
epitopes in contact with strategically positioned basic amino 
acids (or H-bonding donors, mostly main chain N-H groups) in 
the receptor's binding site (see below). A network of 
electrostatic bonds, for example, will be an efficient molecular 
brake when a leukocyte in the bloodstream needs to be brought to 
a stop to adhere to inflamed endothelium (in the case of the 
three distinct lectins of the C-type family called selectins), will 
be molecular glue in transport processes, e.g. apical or axonal 
sulfatide-dependent glycoprotein routing (for C- and P-type 
lectins or a galectin), or will build contacts to trigger outside-in 
signaling (for siglecs). This enumeration underscores the 
potential of glycans to generate affinity, selectivity and 
specificity by different molecular modes of complementarity in 
a binding ('reading') process. The inherent requirement for 
mutual docking makes us aware of another favorable 
feature of glycans in information coding and transfer that is 
explained next. 

The thermodynamics of the association of a glycan to a 
receptor would not include a large entropic penalty, if glycans 
had a low degree of intramolecular flexibility around glycosidic 
bonds, and this often is the case. Since the conformational 
space of oligosaccharides then is well-structured like a landscape 
with energetically privileged valleys, E. Fischer's famous 
lock-and-key analogy can be applied to view these conformers 
as bioactive keys. Interestingly, the conversion between 
conformers of "the bunch of keys - each of which can be 
selected by a receptor" (and differential conformer selection 
is a common phenomenon among lectins) is often a rapid 
process and hereby an impediment to crystallize glycans. When 
a rather rigid oligosaccharide such as sialyl Leˣ meets a 
preformed docking site in a selectin, association driven by ionic 
interaction will even have a high $k_{\mathrm{on}}$-value so that we reach the 
following fundamental conclusion: on the level of molecules 
and cells, "a universal biological principle, namely, molecular 
key-lock configuration as a mechanism of selectivity" is 
operative. 
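
The role of the entropic penalty noted above can be made tangible with the standard relations ΔG = ΔH − TΔS and K_d = exp(ΔG/RT). The following is a minimal sketch under stated assumptions: all numerical values are hypothetical and chosen for illustration only (they are not measured data for any glycan-lectin pair), and the function names are illustrative.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def free_energy(delta_h: float, t_delta_s: float) -> float:
    """Gibbs free energy of binding: dG = dH - T*dS (all in kJ/mol)."""
    return delta_h - t_delta_s

def dissociation_constant(delta_g: float, temperature: float = 298.15) -> float:
    """K_d in mol/L from dG of association: K_d = exp(dG / (R*T))."""
    return math.exp(delta_g / (R * temperature))

# Hypothetical numbers for illustration only (not measured data):
# same enthalpy gain, different entropic penalty (T*dS < 0).
DELTA_H = -40.0
rigid = free_energy(DELTA_H, t_delta_s=-10.0)     # small penalty
flexible = free_energy(DELTA_H, t_delta_s=-25.0)  # large penalty

print(rigid, flexible)  # -30.0 -15.0
print(dissociation_constant(rigid) < dissociation_constant(flexible))  # True
```

At equal enthalpy, the ligand with limited flexibility pays the smaller entropic penalty and therefore binds with the lower dissociation constant.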

The inherent ability to select, choose or read (legere in Latin) 
was the reason to call (glyco)proteins from plants, which 
agglutinated erythrocytes depending on their blood group 
ABH(0) status, lectins (for overviews on lectin history, please see 
Ref. [46]). However, the biochemical nature of what such a 
blood group is had been a mystery, and now we connect to 
α1,2-fucosylation as indicated above to solve it: when performing 
an inhibition assay as designed for serological reactions, 
the agglutination of blood group H(0)-positive erythrocytes by 
eel (Anguilla anguilla) serum was most effectively blocked by a 
sugar derivative, i.e. α-methyl-L-fucopyranoside. Notably, its 
β-anomer was inactive. The anti-H(0) agglutinin thus is a 
fucose-specific lectin with specificity for the α(1,2)-linkage that 
is shown in the bottom panel of Figure 1. This study confirmed 
and extended previous evidence that a haemagglutinin (in this 
case the glucose/mannose-specific concanavalin A) interacts 
with "some compounds present in the surface of such types of 
erythrocytes as it agglutinates - it is possible that this may be a 
carbohydrate group in a protein". The recognition of distinct 
glycan epitopes on the erythrocyte surface and the 
trans-bridging by a phytohemagglutinin, e.g. the blood group A 
tetrasaccharide by Dolichos biflorus agglutinin, seen in such 
assays proves cell-cell adhesion to be established by 
glycan-protein (lectin) recognition, with implications far beyond the 
blood-group typing in transfusion medicine. Owing to the 
pioneering detection and the purification of the first lectins 
from mammalian organs by affinity chromatography in 1974 
and 1975, it became clear that endogenous lectins are a link 
between the glycan-based vocabulary and its functional 
aspects, i.e. the entries into a dictionary of the sugar language. 
Glycan-based words of this vocabulary can receive their 
meaning(s) by pairing with lectin(s), experimentally detectable 
as read-out when measuring biochemical and cell biological 
post-binding effects (see below). 


4. From the Vocabulary to Readers of the 
Sugar-Encoded Information 


The structural unit of each lectin essential for glycan binding is 
the carbohydrate recognition domain (CRD). The assumption of 
a fundamental role of lectins in cell physiology by interplay 
with cellular glycans would be strongly supported, if not only a 
single type of CRD had developed in evolution. Instead, the 
diversity of the glycan vocabulary described above would much 
better be matched by a large pool size of CRDs. Respective 
analyses on lectin structure, indeed, disclosed that more than a 
dozen protein folds are able to generate a CRD. These folds are 
presented in our gallery of human and animal lectins (Supporting
Information, Figure S4). The case of the multi-purpose use
of the β-sandwich platform (adapted to make contacts to sugar
ligands at different sites in the fold without/with the involvement
of coordination bonds to protein-bound Ca²⁺, which help
to distinguish epimers at high-level accuracy; the Ca²⁺ of the
laminin G-like domain (mostly no. 4 of the five linearly arrayed
units) even reaches octahedral coordination with the carboxylate
of GlcA and the 4-OH of xylose (Xyl) of the GlcAβ1,3Xyl
disaccharide of matriglycan) is a role model: it exemplifies the
plasticity of a fold to serve as starting point for the development
of different groups of fold-sharing lectins. Historically, this
fold was the first to be detected in a lectin when solving the
crystal structure of the already mentioned leguminous lectin
concanavalin A and later of the first animal lectin (i.e. galectin-1),
in phytohemagglutinins allowing formation of di- and
tetramers and the discovery of the importance of the quaternary
structure for bioactivity.

During phylogenesis, each CRD is then subject to sequence
diversification after duplication events. In general, they can
occur within the CRD (establishing more than one binding site
per fold as for example seen in β-propeller lectins), for the
CRD within the gene for a modular protein (establishing a
tandem-repeat arrangement of CRDs) and on the level of the
entire gene. Naturally, individual preferences for ligand binding
will hereby be shaped. As a consequence of gene duplication,
each ancestral CRD is the origin of a family of structurally
homologous but distinct proteins. Particular sequence motifs,
e.g. the Glu-Pro-Asn (or Gln-Pro-Asp) triad in the primary
Ca²⁺-binding site of C-type lectins for Man/GlcNAc (or Gal/GalNAc)
binding (programming coordination bonding to select distinct
epimers) or the seven-amino-acid signature (with its Trp for
C-H/π-interactions) for ga(lactose-binding)lectins (= galectins)
govern direct contact building and are thus conserved (see
below). Sequence variations in their local vicinity then implement
grading of the fine- and subspecificities so that each
member of a lectin family can select its set of binding partners
among structurally related glycans. To illustrate this point,
Figure 5 provides a graphical account of natural β-galactosides
that shows the diversity at branch ends of glycans. Of note, the
galectin names entered into the figure inform about
preferences (for surveys on galectin contact sites and
specificities, please see Ref. [55]).

On the cellular level, the actual features of glycan
presentation, e.g. defined by branching, clustering or local
vicinity among glycoconjugates in microdomains, will also have
a major influence on its bioactivity. For example, multivalency
of a glycoconjugate counterreceptor such as a glycoprotein
separates the individual loading steps with lectins into a
gradient of decreasing binding constants up to reaching full
saturation so that the first binding process has the highest
affinity; the emerging rule for fractional occupancy thus
facilitates the initiation of cross-linking (called lattice
formation; this term originates from the "lattice" theory of
serological reactions, in which an antibody precipitates antigens
or agglutinates antigen-bearing cells) of multivalent
glycoconjugates by lectins at physiological concentrations despite
the often low affinity of a sugar ligand when free in solution. The
fine-structural examination of each glycan will identify the
contributors to the affinity and selectivity, and they also include
non-glycan determinants (see below).

Figure 5. Illustrations of galectin binders from the class of natural β-galactosides and naming of examples of mammalian galectins with preference (galectins
in parentheses bind with lesser affinity) for a glycan, for example galectin-8 (Gal-8) for 3'-sulfated LacNAc and the hexasaccharide of ganglioside GD1a or
galectins-1, -2, -3 and -7 (Gal-1, -2, -3 and -7) for the pentasaccharide of ganglioside GM1 (please see Figure 11 for examples of bioactivity of GM1 binding by
these adhesion/growth-regulatory galectins).
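
The gradient of decreasing binding constants for a multivalent counterreceptor can be made concrete with a small numerical sketch. It assumes the simplest textbook model, which is not taken from the cited work: n identical and independent sites on the glycoconjugate, each with the same microscopic association constant, so that the stepwise (macroscopic) constants fall for purely statistical reasons. The function names, the value of 1e4 per molar and the four-site example are illustrative assumptions; gradients measured for real glycoproteins can be steeper than this statistical baseline.

```python
def stepwise_constants(k_micro, n_sites):
    """Macroscopic association constants K_1..K_n for n identical,
    independent sites (assumed model): K_i = k_micro * (n - i + 1) / i."""
    return [k_micro * (n_sites - i + 1) / i for i in range(1, n_sites + 1)]


def fractional_occupancy(k_micro, n_sites, lectin_conc):
    """Mean fraction of occupied sites from the binding polynomial
    built with the stepwise constants (Adair-type formulation)."""
    terms = [1.0]  # statistical weight of the unliganded glycoconjugate
    for k_step in stepwise_constants(k_micro, n_sites):
        terms.append(terms[-1] * k_step * lectin_conc)
    partition = sum(terms)
    bound = sum(i * weight for i, weight in enumerate(terms))
    return bound / (partition * n_sites)


if __name__ == "__main__":
    # Hypothetical tetravalent glycoprotein, microscopic K = 1e4 per molar.
    for step, k_step in enumerate(stepwise_constants(1e4, 4), start=1):
        print(f"loading step {step}: K = {k_step:.0f} per molar")
    print("occupancy at 100 micromolar lectin:",
          round(fractional_occupancy(1e4, 4, 1e-4), 3))
```

In this model the first loading step has a fourfold higher constant than the microscopic value and the last step a fourfold lower one, which mirrors the qualitative statement that the first binding process has the highest affinity.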

How different types of contact along the glycan structure
can team up can be elucidated by using a strategic approach:
the chemical synthesis of the binding partners for lectins makes
their application in activity assays/structural analysis possible,
as noted above when presenting the DiLacNAc synthesis. Hereby,
distinct structure-affinity relationships are traced. Sulfation at
the 3'-position of LacNAc for instance adds ionic recognition to
binding for two galectins. The resulting interactions are
illustrated in Figure 6 (top part): strong affinity of the CRDs of
Gal-4 and -8 to 3'-sulfated LacNAc rests upon the combination
of this ionic interaction with the typical hydrogen bonding
(interestingly, an ionic interaction governs siglec recognition of
the negatively charged (sialylated or sulfated) sugar, too, but
there is an alternative to it described for the β-trefoil fold: a
sulfate can be positioned (by stacking between the B-face of
Gal and Trp's indole ring) to become acceptor for many H
bonds; for details on these two cases, please see Supporting
Information, Figure S5). Interestingly, negatively charged
homogalacturonans bind to the Gal-3 CRD in an unconventional
orientation with the reducing end GalA β-anomer taking the
position of the non-reducing end galactose residue in lactose,
and yet maintaining interactions with the conserved tryptophan
and seven of the most crucial lactose-binding residues, albeit
with different H-bonding interactions. Why Gal-4 and Gal-8
bind the sulfatide headgroup despite the loss of the second
sugar unit that contributes to the recognition had long been a
riddle.

Figure 6. Illustrations of the contact pattern of 3'-sulfated Lac with the N-terminal CRD of human Gal-8 (PDB 3AP6) or Gal-4 (PDB 5DUW) (A, B), of the synthesis
of a bioactive derivative of the sulfatide headgroup (for details, please see Supporting Information, Scheme S3) (C) and crystal/modeled structures of its binding
profile with the two CRDs, the water-mediated contact to sphingosine's hydroxyl group highlighted by arrows (D, E). For details, please see Ref. [60].

This mystery, that is strong binding of the sulfatide headgroup
despite its truncation to 3-sulfated Gal, has recently been
solved by using a synthetic mimetic of the crucial part of the
sulfatide, prepared as shown in Figure 6 (middle), and
crystallography. Intriguingly, recruiting sphingosine's hydroxyl
group and a water molecule to the interaction with galectin, hereby
substituting for the 3-OH group of GlcNAc of the disaccharide,
brings about sufficient extent of bridging (Figure 6 (bottom
part); for details on synthetic procedure of the sulfatide
headgroup, please see Supporting Information, Scheme S3). This
case of molecular compensation of a loss of a carbohydrate,
here GlcNAc, by a non-glycan part documents the possibility for
a broader ligand profile for lectins than exclusively binding
carbohydrates. This is likewise seen in other classes of lectins
such as C-type lectins, for example by detecting extended
binding sites to accommodate phosphoglycolipids and especially
the cord factor (trehalose-6,6'-dimycolate) of mycobacteria
by the dendritic cell (immuno)activating receptor (DCAR) or the
macrophage inducible C-type lectin (MINCLE), respectively (for
details on post-binding outside-in signaling by MINCLE via an
adaptor molecule, see below). Obviously, a look at methods
how to define ligand recognition is now warranted.

The analysis of glycan-lectin specificity is performed by a 
wide range of methods (for an overview including information 
on analyzed aspects and limitation, please see Table in 
Ref. [62]). The strategic combination of carbohydrate chemistry 
with lectin assays has considerably fueled the progress on 
profiling glycan specificity. It also is a rich source of information 
on other aspects of the binding process such as the involve- 
ment of solvent rearrangement for affinity. 

The solvent isotope effect measured by running calorimetric
titrations comparatively in H₂O and D₂O first for leguminous
lectins, recently initiated for a human C-type lectin and two
galectins, indicates an altered solvation within the enthalpically
driven thermodynamics best seen when using an oligosaccharide
(for the example of the thermodynamics of DiLacNAc-galectin
binding in both solvents, please see Figure 2,
bottom). Applying diverse types of biophysical methods to
study (ga)lectin structures has revealed that a broad-range
impact on the protein can ensue from ligand association
beyond the solvation of the contact site in certain proteins.
Protein-type-dependent changes of surrounding loop regions
or of global hydrodynamic properties up to a ligand-induced
compaction and an increased internal protein dynamics have
for example been detected in the cases of a collectin and
human galectins (Supporting Information, Figure S6 shows an
example). Under such circumstances, crystallization of a
complex with ligand will not be favored. Responses to ligand
binding can even be transmitted to other modules beyond the
CRD, to the neck domain of C-type lectin oligomers or the
EGF-like domain of E- and P-selectins (see below). The modules
are therefore more than inert spacers between the cell surface
and the CRD.

In addition to using natural glycans, the scope of experiments
on interaction analysis can be extended when glycan
derivatives with site-specific substitutions (reporter groups) are
synthesized. Preparing deoxy- or fluoroderivatives has not only
enabled the chemical mapping of sites of contact for hydroxyls
of the ligand (for cases of applications on plant and animal
lectins, please see Ref. [66]) but also opened the door to
proceed from work with ¹³C-labeled sugars to other isotopes.
Adding ¹⁹F (and also the ⁷⁷Se isotope of selenoglycosides) as
NMR-spectroscopic sensor in interaction studies (for an example
of a synthetic scheme to produce such a probe and of its recent
application to analyze bound-state glycan structure(s), please
see Figure 7 and Supporting Information, Scheme S4) is a
means to map ligand-lectin contacts in solution quantitatively:
combining short-range heteronuclear (¹H,¹⁹F) relay to F (reF)
with long-range homonuclear (¹H,¹H) TOCSY transfer made it
possible to determine that the dominant contact via one of the
terminal residues of the shown trimannoside in the crystal of the
lectin-ligand complex occurs at a 2:1 ratio between the α1,3- vs
α1,6-linked moieties in solution. Evidently, binding modes in
solution can be accurately dissected with the help of the ¹⁹F
sensor. Like the pieces of a puzzle, the data obtained from all
such studies are further strengthening the postulated wide-scale
ability of constituents of the cellular glycome to be lectin
ligands with the noted key-to-lock-style conformer selection in
the binding process.
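
As a back-of-the-envelope illustration of what a 2:1 ratio of two docking modes means energetically, the sketch below converts a population ratio into fractional populations and a free-energy difference via ΔG = -RT ln(ratio). It assumes that the two modes are in thermodynamic equilibrium and that the NMR-derived ratio directly reflects their populations; the temperature of 298.15 K and the function names are illustrative assumptions and are not values from the cited study.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def populations(ratio):
    """Fractional populations (major, minor) for a major:minor ratio."""
    major = ratio / (1.0 + ratio)
    return major, 1.0 - major


def free_energy_difference(ratio, temperature=298.15):
    """Free-energy difference (kJ/mol) of the major relative to the minor
    binding mode, assuming a two-state equilibrium (Boltzmann weighting)."""
    return -GAS_CONSTANT * temperature * math.log(ratio) / 1000.0


if __name__ == "__main__":
    major, minor = populations(2.0)
    print(f"populations: {major:.2f} / {minor:.2f}")
    print(f"free-energy difference: {free_energy_difference(2.0):.2f} kJ/mol")
```

Under these assumptions a 2:1 preference corresponds to a difference of less than 2 kJ/mol, i.e. smaller than the thermal energy RT at room temperature, which illustrates how subtle the energetic distinction between the two docking modes can be.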

To solidify this fundamental take-home message for tissue
lectins, let us look at the first steps of a common route of glycan
biosynthesis, i.e. mucin-type O-glycans (Figure 8). To make our
point, we have inserted respective information: examples of
cognate lectins, for example siglecs, are named along with the
corresponding glycan for the products of core 1/2 synthesis of
mucin-type O-glycans (Figure 8; for how outside-in signaling is
elicited by such an interaction, see below). Names of lectins have
likewise been added to the listing of glycans in the bottom
panel of Figure 1 such as the binding of the core-fucosylated
N-glycan stem by dectin-1, while binding of pauci- and
oligomannosidic N-glycans by the macrophage mannose receptor and
of GlcNAcβ1,2Man by the liver and lymph node sinusoidal
endothelial cell C-type lectin LSECtin prove this principle to be
at work already for not fully mature N-glycans. This relationship
from words of the vocabulary to distinct 'readers',
without and with involving the 'sulfation code', is also emerging
for glycosaminoglycans. The huge combinatorial potential
offered by epimerase and sulfotransferase activities acting on
their basic disaccharide units (see Supporting Information,
Figure S3 for illustrations) is the basis for a large and further
growing interactome with receptors (for recent compilations of
'readers', please see Refs. [69f,g]). These lines of evidence make
clear that glycan-protein recognition is a frequently taken route
for deciphering a glycan's functional meaning, and this also
includes non-self glycans.

Figure 7. Overview of the synthesis of the trifluorinated N-glycan core trimannoside (for further information, please see Supporting Information,
Scheme S4) (A), the crystallographic information on trimannoside binding in two modes (PDB 1RIN) (B) and NMR-spectroscopical information on binding of
the trifluoro-trimannoside (2F-Man3) by Pisum sativum (pea) agglutinin (C); from left to right: 1D ¹H of Man3, 1D ¹H of 2F-Man3; 2D ¹H,¹⁹F TOCSYreF correlation
spectrum; 2D ¹H,¹⁹F STD TOCSYreF spectra (strips) of 2F-Man3 in the bound state revealing the 2:1 ratio of its two modes of docking via a terminal residue,
i.e. the α1,3- or the α1,6-linked Man moiety, respectively (for details, please see Ref. [67d]).

As alluded to above, the existence of a discriminatory
glycan signature for bacterial surfaces offers the possibility that
lectin recognition becomes a means to trace non-self: Galf (and
also bacterial ulosonic acids) as such signals are indeed
efficiently detected by human intelectin via their common
terminal exocyclic 1,2-diols as structural characteristic.
Similarly, animal and fungal six-bladed β-propeller-type tectonins
such as tachylectin-1 have O-methylated glycans (Man/Fuc
residues with an equatorial hydroxyl neighboring the methylated
position) as conserved target in frontline defense against
infection (bacteria) or predators (nematodes), therefore called a
universal defense armor. Considering this concept of recognition
of an epitope and inhibition of antibody binding by
haptens, explained above when dealing with fucoside-inhibitable
hemagglutination, also helps to solve the mentioned riddle
of why the most common sugar, i.e. Glc, is a ligand exclusively
for just three lectins in the closed environment of the
endoplasmic reticulum, i.e. calnexin, calreticulin and malectin:
the "curious absence" of Glc from mature glycoproteins is
reasonable, because "the efficiency of a recognition surface
based on D-glucosyl components would be impaired by free
D-Glc much like haptens interfere with antigen-antibody
interactions".

Figure 8. Illustration of routes within mucin-type core 1/2 O-glycan biosynthesis. The functional meaning of these words of the glycan vocabulary is indicated
by naming of examples for mammalian lectins that bind respective glycans.

That the ability of a lectin domain can well go beyond
binding glycans has been substantiated above by illustrating in
detail the case of sphingosine's hydroxyl group as part of a
binding partner. Going even further, a separate second site
used for molecular rendezvous can be presented by a lectin.
The slime mold lectin discoidin I with its glycan-dependent
externalization and the fibronectin-like Arg-Gly-Asp motif for
cell-matrix interaction has provided a role model for developing
and appreciating the concept of lectin bifunctionality. In this
case, the two sites are operative at different time points, that is
first during the lectin's externalization and then extracellularly
after the export. Are cases already known in which the two sites
cooperate at the same time?

Among mammalian lectins, the β-sandwich fold of the
mentioned galectins with its F- and S-faces equips a lectin to
bring two types of counterreceptors together, this by using the
second site for specific protein binding, e.g. for the autophagy
receptor NDP52 and other organizers of autophagy or mediators
of endosomal membrane repair or for the chemokine
CXCL12 (experimental data and a structural model of the Gal-3
CRD-chemokine pair are shown in Supporting Information,
Figure S7). By the way, this type of CRD also contains
molecular switches such as oxidizable Cys or Trp residues or a
prolyl peptide bond for cis-trans isomerization to swiftly
regulate lectin activity or quaternary structure (an example of
resonance splitting by such an isomerization process at the
prolyl Pro4 peptide bond of human Gal-7 is shown in
Supporting Information, Figure S8; the phenomenon of two
conformational states and a shift to the cis-bond has been
discovered for lectins in the case of concanavalin A and its
Ca²⁺-induced isomerization of the Ala207-Asp208 peptide bond,
later seen for the two rat mannan-binding proteins (MBPs) and
supposed to have a strong bearing on ligand binding, here at
the peptide bond preceding Pro186 (in serum MBP) or Pro191
(in liver MBP)). Deserving particular attention, the lack of a
signal peptide and thus cytoplasmic biosynthesis predestine
galectins to this role in intracellular surveillance, because they
detect otherwise absent N-glycans at this location after
membrane damage (sensing danger); keeping galectins away
from the classical route of secretion also precludes their
N-glycosylation in the ER that has been shown to impair lattice
formation in the case of the engineered version of human Gal-1
that enters the ER by having been tagged by a signal
peptide.

In summary, the type of the CRD is the common denominator
of a lectin family. Once sequence signatures had been identified,
homology searches across genomes made the full-scale
description of lectin families possible. This sequence
mapping disclosed such a wide range of diversity of a structural
variable that it was puzzling at first. Now, it is increasingly
making sense to give the vocabulary functional meaning by
lectin recognition. Speaking of the modularity of lectin
architecture, the spatial how of pair building between a lectin
and its glycoconjugate counterreceptor is the salient factor
toward triggering post-binding processes: they perform the
actual translation of a 'word' of the glycome vocabulary into
function(s). Thus, lectin design (together with glycan structure,
multivalency and type of binding partner) contributes to shape
functionality. Consequently, if glycans are functional
counterreceptors of tissue lectins, then the number of ways to
present a CRD must be large, exactly as we have seen this to be the
case for the folds with glycan-binding capacity in our Gallery of
Lectins. Figure 9 gives an impression that it is.

A CRD can stand alone or it becomes a part of a molecular
puzzle by the association of modules to form homo- or
hetero-oligomers, even coming together with other types of domains,
covalently or non-covalently (Figure 9). Design diversity is most
impressively illustrated by C-type lectins. Bioactivity as
antimicrobial protein has already been seen on the level of a C-type
CRD in the case of murine RegIII (the human ortholog is called
hepatointestinal pancreatic/pancreatitis-associated protein
(HIP/PAP)). Joining different types of modules is ideal as means to
create tools for many purposes. It is essential to allow
aggregation for sensing glycans presented in clusters, the origin
of the glycoside cluster effect (it is defined as "binding affinity
enhancement exhibited by a multivalent carbohydrate ligand
over and beyond that expected from the concentration increase
resulting from its multivalency", in the case of the trimeric
hepatic C-type lectin and mono-, bi- and trivalent
oligosaccharides yielding a geometrical (logarithmic) increase in
affinity from a numerical increase in valency). The covalent
conjugation of CRDs fabricates the tandem-repeat design, it produces
molecular tentacles when using other modules to present the
CRD on their tips on the surface spatially readily accessible for
making crucial contacts in cell-cell bridging, and it adds a place
for site-specific phosphorylation/association of an intracellular
adaptor in post-binding (outside-in) signaling (Figure 9). That
the extracellular matrix adopts its highly ordered structural
organization is in part made possible by lecticans, which are
glue-like multipurpose tools with a C-type CRD.

Figure 9. Illustration of examples of lectin design starting with a single CRD that can have a short or long tail (the latter for self-association). In clockwise manner,
lectins with modules for covalent subunit association (via disulfide bonds), for non-covalent and linker-mediated modes of CRD associations and for building a
puzzle-like architecture with intracellular domains for outside-in signaling are displayed. Abbreviations are given to define distinct lectins for each type of shown
architecture (Reproduced with permission from Ref. [98c] Copyright 2015 Elsevier; for further information, please see Refs. [77b,g-k, 82f, 83c,e, 86i, 98]).

Historically, the enormous potential of permutations of CRD
specificity with the diversity of modular design has first been
realized in studies with plant lectins (agglutinins). Explicitly, the
switch from natural tetra- to bivalency by chemical treatment
(succinylation or acetylation) was shown to reduce cap
formation on murine spleen cells: the type of quaternary
structure matters. So the take-home message is that the modular
design of the lectin is a factor that underlies the intriguing
selectivity and specificity of pairing of a lectin with its
counterreceptor(s). This process is intimately dependent on the
context, giving lectins, glycans and glycoconjugates the fundamental
ability to become multi-purpose tools in vivo. This raises our
curiosity to learn about actual functionality of this interplay, its
currently known spectrum and specific cases.


5. From the Vocabulary and the Readers to the
Dictionary of the Sugar Code



A dictionary of the sugar language is supposed to correlate 
structural aspects of a glycan and of a lectin with a cellular 
function. The current status of knowledge on lectin functions is 
summarized in Supporting Information, Table S1 and listing 
general terms there calls for illustrating a specific case: Fig- 
ure 10 presents a route from upregulated lectin expression to 
manifestation of a common disease with large socioeconomic 
impact, i.e. osteoarthritis. 

The examples of how glycans and lectins cooperate in the 
already mentioned processes of leukocyte adhesion during 
inflammation (by selectins) or postal-code-like routing of 
distinct glycoproteins similarly supply information on the 
underlying intimate interplay and are thus outlined here. The 
mentioned high k,,-rates of the association of negatively 
charged sugars such as a sialyl Le* epitope (most active with 6- 
sulfation of the GICNAc moiety) to a (selectin) CRD presented at 
the tip of the tentacle-like design that hereby reaches out into 
the bloodstream (see Figure 8) will make nearly immediate 
contact to this counterreceptor to slow down cells to a rolling 
on the endothelium to let integrins tighten the grip in the next 
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Figure 10. 


Illustration of the route of galectin-driven osteoarthritis pathogenesis by upregulation of pro-degradative/-inflammatory effectors such as 


interleukins (IL) and matrix metalloproteinases (MMPs) that starts with dysregulated galectin expression. Their secretion, cell surface binding and the triggered 
outside-in signaling to reprogram IL/MMP gene expression via a downstream effector, i.e. the transcription factor NK-«B, lead to matrix degradation in vitro 


and in vivo (for details, please see Ref. [99]). 
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step; triantennary N-glycans after a2,3-desialylation (for the tom)) and LacNAc-terminated N-glycans of cargo glycoproteins 
asialoglycoprotein receptor), hybrid- or high-mannose-type N- together with the sulfatide headgroup (for the heterobivalent 
glycans with Man-6-phosphate (for P-type lectins), the A:  galectin-4) are postal codes written in glycans for lectin-specific 
sulfated GalNAcB1,4GIcNAcB1,R (LacdiNAc) unit of N-glycans ` routing and delivery.5%79°*! 

(for the contact site of the f-trefoil domain of the (macrophage) 


That post-binding signaling triggered by glycan-lectin 
mannose receptor, see Supporting Information, Figure S5 (bot- 


pairing becomes pathophysiologically relevant has been docu- 
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Figure 11. Effect of wild-type and of engineered human galectins on neuroblastoma cell (SK-CN—MC) growth. Galectin architecture, microphotographs of 
representative cultures and a bar graph of cell numbers are shown. Galectins are tested at 100 uig/mL (* 10 ug/mL), wild-type Gal-3 and its Gal-3NT/1 variant 


are used in 10fold excess in the mixtures with Gal-1 (for details on proteins, impact of architecture on lattice formation by testing synthetic glycoclusters and 
assay conditions, please see Refs. [95c,d, 100]). 
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mented in various cases, and the processes leading to and, to reach a high level of an effect, lectin availability at the 


progression of osteoarthritis driven by galectins shown in 
Figure 10 give an instructive example. In order to avoid falling 
into the trap of thinking that lectin activities can simply be 
extrapolated, the context-dependent nature of response pat- 
terns definitely precludes extrapolations (we all know so well 
that the meaning of a word in any language can be contingent 
on the context of the sentence). Literally, the same lectin can 
hereby elicit opposite effects, for example pro- or anti- 
inflammatory or -tumoral activities. This inherent potential for 
duality warrants attention when considering pharmacological 
targeting of tissue lectins without reaching site-specific delivery. 
The recurring theme thus is that context matters. 

Next, the following specific cases of O-glycans shown in 
Figure 8 underpin the principle that adding a symbol repro- 
grams a glycan word's meaning profoundly: the addition of a 
sialic acid to mucin-type O-glycans can establish the basis for 
association to a member of the siglec family. Binding the 
sialylated core 1 disaccharide (also called the Thomsen- 
Friedenreich (TF) antigen or CD176) by siglec-7 converts 
monocytes into tumor-associated macrophages, binding the 
disialyl core 1 tetrasaccharide when presented clustered on 
leukemia cells by the glycoprotein CD43 primes the intracellular 
immunoreceptor tyrosine-based inhibitory motif of siglec-7 to 
convey negative signaling to reduce killing activity (on the 
other hand, the high-affinity interplay between 6’-sulfo sialyl Le 
and siglec-8 is an example for a self-glycan code-guided return 
to homeostasis after inflammation by depleting eosinophils 
from tissues), whereas sialyl T, (CD175s) binding by siglec-15 
(T4, O-linked a-GalNAc (n — nouvelle), is not active) transmits a 
positive signal via association with DNAX-activating protein of 
12 kDa (DAP12) in the transmembrane region to increase spleen 
tyrosine kinase (Syk) activity and hereby transforming growth 
factor (TGF)- secretion by tumor-associated macrophages.*? A 
schematic drawing on the special route of signal transfer from 
the out- to the inside by association with adaptor molecules 
(DAP12 or the F. receptor y chain (F.Ry)) that contain an 
immunoreceptor tyrosine activation-like motif (ITAM)) is shown 
in Figure 9. Not surprising, this mechanism is also operative for 
several C-type lectins like the already mentioned MINCLE (with 
E Rui Pi Bidirectional (cis and trans) signaling between axon and 
myelin, to give a further example, is exerted by pairing of the 
sialyl core 1 trisaccharide present on gangliosides such as GD1a 
or GT1b with myelin-associated glycoprotein (siglec-4a), which 
then appears to favor lectin dimerization and to associate the 
cytoplasmic non-receptor tyrosine kinase Fyn as relay station 
when alone or as a heterotetramer with the dynein light 
chain äi 

Research over decades has shown that cellular activation or 
differentiation, inflammation or the activity status of distinct 
genes such as oncogenes or tumor suppressor genes induce a 
reprogramming of aspects of glycosylation. For example, the 
extent of sialylation or of 81,6-branching and the occurrence of 
LacNAc repeats in N- and O-glycans, of sialyl Lewis" production 
or of conversion of ganglioside GD1a into the galectin counter- 
receptor GM1 are modulated. The concept of the sugar code 
predicts the possibility of an in situ interplay with tissue lectins, 
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right place could be regulated in a coordinated manner. This 
has already been revealed to occur for selectins and galectins-1, 
-3 and -7.5959 Serving as a proof-of-principle case, anoikis 
induction in pancreas cancer in vitro by the tumor suppressor 
p16INK4a is based on orchestrating a downregulation of sialic 
acid biosynthesis (at enzyme (NANS and GNE) expression level) 
and hereby of N-glycan α2,6-sialylation (that precludes galectin 
binding by occupying the OH group of the hydroxymethyl of 
Gal, a major contact point) with an upregulation of both 
galectin-1 and its glycoprotein counterreceptor α5β1-integrin, 
fully suited for integrin cross-linking without α2,6-sialylation, so 
that focal adhesion kinase and, further downstream, caspase-8 
activation will drive these tumor cells into death. As noted 
above and worth emphasizing, the glycan profile and 
structure, its local density and context-specific mode of 
presentation are the parameters for enabling a glycoconjugate 
to become the local counterreceptor for a lectin, its nature 
such as an integrin and the type of lectin architecture then 
determining the post-binding effects. Already a perturbation of 
the glycan profile by a single N-glycan, as shown in the gain-of- 
glycan Thr160Asn mutant of the interferon-γ receptor 2 subunit, 
can have a tremendous impact: here, it partitions the glyco- 
protein differently among compartments via galectin binding 
and thus impairs receptor functionality. 

Noteworthy in this context, adding a sugar to a certain 
acceptor can have a second consequence besides being a part 
of the region for lectin binding. Such a shift in the glycome can 
also make its presence felt by precluding synthesis of a glycan 
that is a lectin ligand. Figure 8 guides to this insight. The α2,6- 
sialylation of the Tn epitope, for example, also abrogates the 
generation of core 1/2 glycans, the presence of sialic acid 
preventing any glycosyltransferase from accepting sTn as 
substrate (Figure 8). Since the core 1 disaccharide is a Gal-3 
binder and presumably involved in cell contacts in the metastatic 
cascade, the respective enzymatic activity (e.g. ST6GalNAc-II 
or IV) has been discussed as potential metastasis suppressor, by 
shifting product presence away from T(F) to Tn. Alternatively, 
the sialic acid can occupy a crucial contact to block galectin 
binding: α2,6-sialylation does so at N-glycan termini (which is 
the case to avoid Gal-1-dependent induction of anoikis in tumor 
cells), in turn generating siglec binders (see above). Hereby, the 
rule for intimate correspondence between the glycan vocabu- 
lary and the meaning of its words is further demonstrated, 
reinforcing the case for a fundamental principle and the 
feasibility to set up a dictionary for the glycan vocabulary. 

This principle also works wonders on a common acceptor in 
N-glycans, that is a GlcNAc-terminated branch. Its alternative 
usage as substrate leads to words with separate meanings 
along the different routes. Briefly, when we look at the 
mentioned case of the generation of the LacdiNAc platform by 
GalNAc (not Gal) addition to GlcNAc in an N-glycan branch end, 
the Pro-Leu-Arg-Ser-Lys-Lys recognition determinant of the 
glycoprotein in the vicinity of the N-glycosylation site accounts 
for the already noted target specificity of this process and then 
4’-sulfation follows to yield the mentioned routing (postal-code- 
like) signal, whereas α2,6-sialylation or α1,3-fucosylation of the 
acceptor in other glycoproteins are possible. Overall, interpreting 
glycome representation and shifts in the usage of the vocabulary 
with a dictionary of the sugar language at hand suggests that 
more discoveries are in store for respective investigations. 

The implied relevance of the modular architecture of lectins 
shown in Figure 9 has already been revealed in diverse ways 
physiologically, as the entries in Table S1 attest. Clearly, CRDs 
alone would not be able to create such a large panel of 
bioactivities. The catch-bond phenomenon, which lets the strength 
of cell binding counterintuitively increase under shear stress, would 
be impossible if ligand binding and the ensuing change in 
orientation of the modular arrangement did not prepare a 
selectin (or a bacterial adhesin) to withstand even the influence 
of external force in its function as molecular anchor with clinical 
significance in defense (or in infection). Notably, the way 
bacterial and fungal adhesins as well as viral haemagglutinins 
convert host glycans to docking sites for infection is a dark side 
of sugar coding. Associations of uropathogenic E. coli or 
Fusobacterium nucleatum mediated by the O-glycan core 1 
disaccharide and of Helicobacter pylori by O-glycan-presented 
Lewis epitopes give these ‘words’ further meanings to be added 
to the dictionary. 

What Figure 9 teaches us beyond documenting Nature's 
ingenuity in protein design is the large uncharted territory 
ahead of us to unveil the full significance of the known types of 
modular architecture and to unravel activity profiles of new 
types of design. The latter challenge is addressed by applying 
rational protein engineering to find answers: the merit of this 
approach is documented here with experimental data. By using 
galectins as proof-of-principle models in cell growth assays, first, 
functional antagonism is seen between human Gal-1 and -3 
(Figure 11; for details, please see legend). They compete for the 
same counterreceptor but differ in modular design so that the 
architecture of the lattice will look different. Members of the 
same lectin family can thus interfere with each other in a 
certain cellular context, in contrast to the cooperation seen in 
osteoarthritis pathogenesis above (Figure 10). CRD switching by 
engineering demonstrates the importance of protein architec- 
ture (Figure 11). 

Next, increased activity of an engineered tetramer relative 
to the homodimer is revealed (Figure 11). This result lets us 
wonder why no human galectin has adopted this type of 
modular design (the answer is that the tetramer's high affinity 
would sense already low-level ganglioside GM1 presence, 
making its assignment as molecular switch impossible). Finally, 
the potential for covalently linked heterodimer variants to exert 
higher activity than wild-type proteins is sketched, supporting 
physiological significance of heterodimers (Figure 11; an exam- 
ple for occurrence of galectin heterodimers in mixtures of Gal-7 
together with the galectin-3 CRD by CRD switching is given in 
Supporting Information, Figure S9). These data embody the 
attractive perspectives for obtaining i) further understanding of 
structure-function relationships and ii) new reagents for bio- 
medical applications by tinkering with a toolbox of human 
CRDs and other modules (for information on concept and on 
details, please see Ref. [95]). 

6. Conclusions and Perspectives 


A close inspection of the properties of sugars indeed proves 
that they are ideal symbols for a code. Cooperation by writers, 
editors and erasers establishes a large vocabulary by using 
these letters. Molecular complementarity by combinations of 
coordination, hydrogen and ionic bonding, C-H/π-interaction 
and stacking underlies the reading. Like glycans, sugar 
receptors (lectins) come in many forms, more than a dozen 
protein folds endowing sugar binding to the proteins of the 
lectin superfamily. The sheer size of sequence changes among 
CRDs as well as of the diversity of quaternary structures and of 
types of modular design equips the lectin toolbox with 
enormous possibilities for selectively interacting with cellular 
glycoconjugates and for eliciting meaningful post-binding 
events, the equivalent of the translation of a message. Hereby, 
the vocabulary is turned into a dictionary of the sugar code. 
Notably, a glycan ‘word’ can have different meanings depend- 
ing on the context, as some ambiguity occurs in a language. 
The emerging insights, to keep this part short and sweet, 
are sure to guide us to novel hypotheses and to a more 
thorough understanding of cellular systems. For example, 
powered by hypothesis-driven tinkering with glycan or lectin 
features, rational engineering can spawn new tools for 
applications, e.g. biomedically active lectin variants with non- 
natural architecture as platform for CRD presentation. These 
data also let us realize that, and how, the three alphabets of life 
go hand in hand in the flow of biological information. 
Each is suited to meet special needs for life, each is a code 
system. Compelling evidence is thus available to let the term 
‘sugar code’ reach common parlance. Turning back to the 
introductory statement by N. Sharon, he concluded his lecture 
by stating that it is his hope "that I have convinced you why 
this field is of such great importance, and why it is so 
exciting". 

Information coding by sugars: 
Letting symbols convey information 
comes in many forms such as the QR 
code. We explain that sugars are an 
ideal alphabet of life. They give rise to 
a bio-vocabulary of unsurpassed size. It 
is read and translated by a matching 
diversity of sugar receptors (lectins) 
so that establishing a dictionary for 
the glyco-vocabulary is now in 
progress. The given QR code directs 
you to our review on the sugar code. 